
Feat/optimize html streaming #351

Draft
prk-Jr wants to merge 16 commits into main from feat/optimize-html-streaming

Conversation


@prk-Jr commented Feb 20, 2026

Summary

This PR combines the core publisher-proxy streaming optimization with the
Next.js RSC follow-up work.

At the platform level, Trusted Server moves from a fully buffered proxy model
to chunked streaming using Fastly stream_to_client(), enabling early header
flush and incremental HTML delivery to reduce TTFB and improve subresource
discovery.

On top of that foundation, the HTML pipeline now supports RSC-aware lazy
accumulation: non-RSC content continues to stream immediately, while only RSC
content that requires post-processing is buffered and rewritten safely. This
preserves correctness for fragmented/cross-script RSC payloads while restoring
meaningful streaming behavior.


Key Changes

  • stream_to_client() Integration (publisher.rs)
    Replaced fully buffered response collection with stream_to_client() to
    enable immediate header dispatch and incremental chunk streaming.

  • lol_html Output Pipeline (streaming_processor.rs)
    Refactored the HtmlRewriter adapter to implement the OutputSink trait
    with a shared Rc<RefCell<Vec<u8>>>, enabling true incremental streaming.

  • Buffer Pre-allocation
    Replaced std::mem::take with Vec::with_capacity and
    std::mem::replace to reduce reallocation churn during chunk processing.

  • WASM Hostcall Batching
    Wrapped StreamingBody output in an 8KB std::io::BufWriter to reduce
    WASM-to-host boundary crossings.

  • RSC Lazy Accumulation (html_processor.rs)
    Added conditional accumulation mode that starts buffering only when
    post-processing is required (for example, RSC placeholders or fragmented
    scripts). Non-RSC pages continue streaming instead of being fully buffered.

  • RSC Post-processing Triggers (nextjs integration)
    Added needs_accumulation support to integration post-processors and
    needs_post_processing detection in placeholder state, including fragmented
    script tracking for fallback re-parse correctness.

  • Memory Safety Guardrail
    Added a 10MB cap for accumulated post-processed HTML to avoid unbounded
    memory growth on large/malicious documents.

  • Routing and Header Consistency (fastly/src/main.rs, publisher.rs)
    Centralized route classification and standardized response-header application
    across buffered and streaming paths.

  • RSC Fixture/Test Expansion
    Added fixture-driven Next.js integration tests (including real Next.js output)
    plus a dedicated example app and scripts for fixture capture and live streaming
    validation.

  • Code Health
    Resolved associated Clippy warnings and added missing # Errors
    documentation in streaming-related handlers.
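The lazy-accumulation flow described above can be sketched with std types only. This is illustrative, not the actual `html_processor.rs` code: `ProcessorSketch`, `needs_post_processing`, and `post_process` are hypothetical names, the `self.__next_f` trigger is a stand-in for the real RSC placeholder/fragmented-script detection, and the shared `Rc<RefCell<Vec<u8>>>` sink mirrors the lol_html `OutputSink` adapter without depending on the crate:

```rust
use std::cell::RefCell;
use std::rc::Rc;

const MAX_ACCUMULATED: usize = 10 * 1024 * 1024; // 10MB guardrail from this PR

/// Shared output sink, mirroring the Rc<RefCell<Vec<u8>>> adapter used with
/// lol_html's OutputSink trait (illustrative, not the real type).
type SharedSink = Rc<RefCell<Vec<u8>>>;

struct ProcessorSketch {
    sink: SharedSink,
    accumulating: bool,
    buffer: Vec<u8>,
}

impl ProcessorSketch {
    fn new(sink: SharedSink) -> Self {
        Self { sink, accumulating: false, buffer: Vec::new() }
    }

    /// Stream non-RSC chunks straight through; start buffering only once a
    /// chunk is detected as needing post-processing.
    fn process_chunk(&mut self, chunk: &[u8]) -> Result<(), String> {
        if !self.accumulating && needs_post_processing(chunk) {
            self.accumulating = true;
            // Pre-allocate instead of growing from empty, echoing the
            // Vec::with_capacity / std::mem::replace change.
            self.buffer = Vec::with_capacity(64 * 1024);
        }
        if self.accumulating {
            if self.buffer.len() + chunk.len() > MAX_ACCUMULATED {
                return Err("accumulated HTML exceeds 10MB cap".into());
            }
            self.buffer.extend_from_slice(chunk);
        } else {
            self.sink.borrow_mut().extend_from_slice(chunk);
        }
        Ok(())
    }

    /// At end of stream, rewrite and flush whatever was accumulated.
    fn finish(mut self) -> Result<(), String> {
        let buffered = std::mem::take(&mut self.buffer);
        let rewritten = post_process(&buffered);
        self.sink.borrow_mut().extend_from_slice(&rewritten);
        Ok(())
    }
}

fn needs_post_processing(chunk: &[u8]) -> bool {
    // Illustrative trigger only; the real detection tracks RSC placeholders
    // and fragmented <script> payloads across chunk boundaries.
    chunk.windows(b"self.__next_f".len()).any(|w| w == b"self.__next_f")
}

fn post_process(html: &[u8]) -> Vec<u8> {
    html.to_vec() // no-op stand-in for RSC Flight URL rewriting
}
```

A page that never trips the trigger never sets `accumulating`, so every chunk reaches the client sink immediately, which is why non-RSC pages keep streaming.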


Test Plan

  • Local Unit & Workspace Tests
    Run:

    cargo test --workspace
  • TypeScript Bundle Build
    Run:

    npm run build

    in crates/js/lib to verify successful generation of integration modules.

  • Next.js RSC Integration Tests
    Run:

    cargo test --test nextjs_integration -- --nocapture

    to validate URL rewriting correctness and streaming behavior across fixture
    sets/chunk sizes.

  • Local Fastly Simulation
    Run:

    fastly compute serve

    Verify:

    • Headers are correctly injected on streamed responses
    • Proxy behavior remains correct
    • Baseline TTFB improvements (for example, via curl)
  • Staging Load Testing
    Execute:

    ./scripts/benchmark.sh

    against staging to quantify external TTFB and Time-to-Last-Byte (TTLB)
    improvements under concurrent traffic.
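For the curl-based TTFB spot check mentioned above, curl's `-w` timing variables report TTFB (`time_starttransfer`) and TTLB (`time_total`) directly. The URL below is a placeholder, not the actual staging hostname:

```shell
#!/bin/sh
# Placeholder endpoint; substitute the real staging or production hostname.
URL="https://staging.example.com/"

# -s silences the progress meter, -o /dev/null discards the body, and -w
# prints the timing variables curl recorded for the transfer.
curl -s -o /dev/null \
  -w 'TTFB: %{time_starttransfer}s  TTLB: %{time_total}s\n' \
  "$URL"
```

Running this a handful of times against both branches gives a quick median before reaching for the full `./scripts/benchmark.sh` run.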

Closes

Closes #320

prk-Jr and others added 11 commits February 18, 2026 21:33
Introduce RequestTimer for per-request phase tracking (init, backend,
process, total) exposed via Server-Timing response headers. Add
benchmark tooling with --profile mode for collecting timing data.
Document phased optimization plan covering streaming architecture,
code-level fixes, and open design questions for team review.
RequestTimer and Server-Timing header were premature — WASM guest
profiling via profile.sh gives better per-function visibility without
runtime overhead. Also strips dead --profile mode from benchmark.sh.
build.rs already resolves trusted-server.toml + env vars at compile time
and embeds the result. Replace Settings::from_toml() with direct
toml::from_str() to skip the config crate pipeline on every request.
Profiling confirms: ~5-8% → ~3.3% CPU per request.
- OPTIMIZATION.md: profiling results, CPU breakdown, phased optimization
  plan covering streaming fixes, config crate elimination, and
  stream_to_client() architecture
- scripts/profile.sh: WASM guest profiling via --profile-guest with
  Firefox Profiler-compatible output
- scripts/benchmark.sh: TTFB analysis, cold start detection, endpoint
  latency breakdown, and load testing with save/compare support
…ding HTML and RSC Flight URL rewriting, to avoid full-body buffering
@prk-Jr self-assigned this Feb 20, 2026

@prk-Jr commented Feb 23, 2026

Performance Benchmark: HTML Streaming Optimization

We ran a comprehensive apples-to-apples benchmark to measure the impact of the feat/optimize-html-streaming branch (which introduces lol_html for streaming <body> transformations instead of buffering).

To ensure statistical accuracy:

  • We increased the sample size to 50 requests.
  • We ran a 10-request deep warmup to eliminate cold-start WASM instantiations.
  • We tested both branches on Staging and Production directly to isolate environment-specific effects.

🚀 The Results: Production

This is the true impact on live users hitting the Fastly Edge:

| Metric | Baseline (main) | Optimization (feat/...) | Net Impact |
| --- | --- | --- | --- |
| First Byte (Median TTFB) | 160.09 ms | 144.75 ms | 🟢 15.34 ms Faster |
| First Byte (p95 Tail) | 252.78 ms | 228.93 ms | 🟢 23.85 ms Faster |
| Total Transfer Time | 315.19 ms | 345.20 ms | 🔶 30.01 ms Slower |

📉 The Results: Staging

(Note: Total times are higher here because Staging serves a 190KB uncompressed JS bundle, whereas Prod serves a minified 28KB bundle).

| Metric | Baseline (main) | Optimization (feat/...) | Net Impact |
| --- | --- | --- | --- |
| First Byte (Median TTFB) | 220.44 ms | 217.38 ms | 🟢 3.06 ms Faster |
| First Byte (p95 Tail) | 478.48 ms | 277.93 ms | 🟢 200.55 ms Faster |
| Total Transfer Time | 505.28 ms | 654.58 ms | 🔶 149.30 ms Slower |

🎯 Conclusion

The lol_html streaming processor behaves exactly as architecturally intended:

  1. Massive Win for Core Web Vitals: Because we no longer wait for the backend to buffer the entire HTML document before transmitting, the Fastly edge begins sending the <head> tag to the user's browser 15ms sooner on average (and up to 200ms sooner in worst-case staging scenarios). This means the browser can start downloading CSS/JS assets much faster.
  2. Acceptable CPU Overhead: Streaming chunk-by-chunk through the WASM boundary does consume more CPU time. On production hardware, this means the page finishes loading about 30ms later.

Exchanging 30ms of trailing transfer time for 15-20ms of upfront TTFB savings is a highly favorable trade for perceived performance. This branch is safe and recommended for merge.

@prk-Jr linked an issue Feb 23, 2026 that may be closed by this pull request
prk-Jr and others added 4 commits February 23, 2026 20:39
* Optimize Next.js RSC streaming with lazy accumulation

Implement lazy buffering that delays accumulation until RSC content is
detected, improving streaming from 0% to 28-37% for RSC pages while
maintaining 100% URL rewriting correctness.

- Add needs_accumulation() trait for conditional buffering
- Add 10MB memory limit for DoS protection
- Create integration test suite with real Next.js fixtures
- Add example Next.js app for testing

Performance: RSC pages stream 28-37% (theoretical max), non-RSC 96%.

* Preserve publisher fallback headers, centralize route classification, and always clean up live test temp files
@prk-Jr marked this pull request as ready for review February 25, 2026 16:32
@aram356 marked this pull request as draft February 26, 2026 16:48

Development

Successfully merging this pull request may close these issues.

Enable Streaming Chunks for responses to improve TTFB on TS

1 participant